feat: Add base64 and HTTP image URL support to vLLM workers #4114

KrishnanPrash · 2025-11-05T00:54:38Z

Overview

Enables vLLM backend workers to process base64 data URL images.

Backend workers now extract image URLs from PreprocessedRequest.multi_modal_data (added in #3733) and decode them using ImageLoader:

data:image/*;base64,<encoded> → Decoded to PIL.Image
http:// → Fetched and loaded as PIL.Image

Related PRS:

feat: Media URL passthrough in OAI preprocessor #3733: added multimodal URL passthrough in Rust preprocessor
feat: Media HTTP fetching and b64 decoding #3967, feat: Image decoder in the frontend #3971, feat: decoded media via NIXL #3988: This PRs support doing HTTP fetching or b64 decoding in the frontend worker.
- Once these PRs are merged, will need a follow-up PR that will replace the current decoding done in _extract_multimodal_data() with a NIXL read.

Details

Modified: components/src/dynamo/vllm/handlers.py

Added _extract_multimodal_data() method

Scripts:

Created agg_multimodal.sh - Standard deployment using Rust preprocessor
Renamed agg_multimodal.sh → agg_multimodal_epd.sh - EPD architecture (preserved)

Tests:

Renamed existing tests with _epd suffix (e.g., multimodal_agg_qwen → multimodal_agg_qwen_epd)
Added new multimodal_agg_qwen test using standard deployment
Validates both HTTP and base64 URLs passthrough.

Where should the reviewer start?

handlers.py:116-168 - Extraction logic
test_vllm.py:166-195 - Test validation

Summary by CodeRabbit

New Features
- Added multimodal image support for vLLM, enabling both URL-based and base64-encoded image inputs
- Introduced Encode-Prefill-Decode (EPD) deployment architecture for multimodal backends
- Added support for Qwen2.5-VL-7B-Instruct model with optimized GPU memory configuration
Tests
- Added comprehensive multimodal test cases with image URL and base64 encoding validation

Signed-off-by: Krishnan Prashanth <[email protected]>

coderabbitai · 2025-11-05T00:59:50Z

Walkthrough

This pull request adds multimodal image data support to the vLLM handler system. It introduces image loading and data extraction logic in the core handlers, updates deployment scripts to reflect a simplified or alternative EPD architecture for multimodal inference, and extends test configurations with image URLs and base64-encoded test data.

Changes

Cohort / File(s)	Summary
Core multimodal handler logic `components/src/dynamo/vllm/handlers.py`	Added `ImageLoader` attribute and `_extract_multimodal_data` method to `BaseWorkerHandler`; updated `DecodeWorkerHandler.generate` and `PrefillWorkerHandler.generate` to inject extracted multimodal data into `TokensPrompt`; includes error handling and future-proofing for video_url.
Deployment script refactor `examples/backends/vllm/launch/agg_multimodal.sh`	Simplified multimodal backend to single vLLM process with Dynamo frontend; replaced llava-hf/llava-1.5-7b-hf with Qwen/Qwen2.5-VL-7B-Instruct; removed prompt-template logic; added GPU memory optimization flags.
New EPD architecture script `examples/backends/vllm/launch/agg_multimodal_epd.sh`	New Bash script implementing 3-component Encode-Prefill-Decode multimodal backend; includes CLI argument parsing, dynamic GPU memory optimization per model, and orchestration of preprocessor and worker processes.
Test configuration updates `tests/serve/test_vllm.py`	Added module constants `BUS_IMAGE_URL` and `BUS_IMAGE_B64`; introduced `stragglers` field to `VLLMConfig`; renamed and reconfigured multimodal_agg_llava to multimodal_agg_llava_epd; extended multimodal_agg_qwen with base64 data URL test payload; added new multimodal_agg_qwen_epd config entry.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

components/src/dynamo/vllm/handlers.py: Review image loading initialization, _extract_multimodal_data logic, and correct injection points in both DecodeWorkerHandler.generate and PrefillWorkerHandler.generate; verify error handling and type compatibility with vLLM's multimodal data format.
examples/backends/vllm/launch/agg_multimodal_epd.sh: Validate GPU binding, worker orchestration, and GPU memory argument construction for model-specific optimization.
tests/serve/test_vllm.py: Confirm image URL and base64 payload formats; verify test expectations align with new configuration structure.

Poem

🐰 Behold, images now hop through the pipeline flow,
From URLs and base64, the handlers make them glow,
Multimodal dreams extracted, injected with care,
With EPD workers dancing through the data air! 📸

Pre-merge checks

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description covers all required template sections with sufficient detail: Overview explains the feature, Details specify modified files and scripts, and Where to start provides exact line numbers for reviewer guidance.
Title check	✅ Passed	The title 'Add base64 and HTTP image URL support to vLLM workers' directly and accurately summarizes the main change: enabling vLLM workers to process base64 data URL images and HTTP image URLs through the new _extract_multimodal_data() method.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83a3fe4 and ea4a792.

⛔ Files ignored due to path filters (1)

lib/bindings/python/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

components/src/dynamo/vllm/handlers.py (5 hunks)
examples/backends/vllm/launch/agg_multimodal.sh (3 hunks)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
tests/serve/test_vllm.py (5 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: 2025-09-16T19:47:30.312Z

Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

components/src/dynamo/vllm/handlers.py

📚 Learning: 2025-10-28T04:09:48.264Z

Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

components/src/dynamo/vllm/handlers.py

🧬 Code graph analysis (1)

tests/serve/test_vllm.py (1)

tests/utils/payload_builder.py (1)

chat_payload (81-108)

🪛 Ruff (0.14.3)

components/src/dynamo/vllm/handlers.py

149-149: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

tests/serve/test_vllm.py

31-31: Probable use of requests call without timeout

(S113)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: trtllm (amd64)
GitHub Check: operator (amd64)
GitHub Check: Build and Test - dynamo

tests/serve/test_vllm.py

milesial

LGTM

components/src/dynamo/vllm/handlers.py

Signed-off-by: Krishnan Prashanth <[email protected]>

indrajit96

Nice work with E2E PR!
Left a few questions and minor comments.
LGTM!

components/src/dynamo/vllm/handlers.py

krishung5

lgtm!

Signed-off-by: Krishnan Prashanth <[email protected]>

rmccorm4

Please remove test_multimodal.sh from the root of the repo. It shouldn't be at top level. It should either be removed or put in some test utils type of folder.

KrishnanPrash · 2025-11-05T19:05:07Z

Please remove test_multimodal.sh from the root of the repo. It shouldn't be at top level. It should either be removed or put in some test utils type of folder.

Added for ease of cluster testing. Will clean up before merging.

Signed-off-by: Krishnan Prashanth <[email protected]>

addressed

…o#4114) Signed-off-by: Krishnan Prashanth <[email protected]>

rmccorm4 · 2025-11-08T02:58:01Z

examples/backends/vllm/launch/agg_multimodal_epd.sh

+fi
+
+# Start processor (Python-based preprocessing, handles prompt templating)
+python -m dynamo.vllm --multimodal-processor --model $MODEL_NAME --mm-prompt-template "$PROMPT_TEMPLATE" &


Question for next week - if the worker/backend supports PreprocessedRequest.multi_modal_data now, do we need this multimodal-preprocessor that explicitly registers as expecting ModelInput.Text so it can do the processing itself?

ref:

dynamo/components/src/dynamo/vllm/main.py

Line 555 in 51c4fe6

ModelInput.Text, # Custom processor is used and this type bypasses SDK processor

KrishnanPrash added 2 commits November 4, 2025 11:23

Working version 1

43a9119

Signed-off-by: Krishnan Prashanth <[email protected]>

Updated Testing and Scripts

ea4a792

Signed-off-by: Krishnan Prashanth <[email protected]>

KrishnanPrash requested review from a team as code owners November 5, 2025 00:54

pull-request-size bot added the size/L label Nov 5, 2025

KrishnanPrash requested review from indrajit96, krishung5, milesial and rmccorm4 November 5, 2025 00:54

KrishnanPrash changed the title ~~Add base64 and HTTP image URL support to vLLM workers~~ feat: Add base64 and HTTP image URL support to vLLM workers Nov 5, 2025

github-actions bot added the feat label Nov 5, 2025

coderabbitai bot reviewed Nov 5, 2025

View reviewed changes

tests/serve/test_vllm.py Outdated Show resolved Hide resolved

milesial approved these changes Nov 5, 2025

View reviewed changes

components/src/dynamo/vllm/handlers.py Outdated Show resolved Hide resolved

Cleaning up comments

0e87768

Signed-off-by: Krishnan Prashanth <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 01:12 Inactive

Address Comment

1c718ac

Signed-off-by: Krishnan Prashanth <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 01:19 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 01:20 Inactive

indrajit96 approved these changes Nov 5, 2025

View reviewed changes

components/src/dynamo/vllm/handlers.py Outdated Show resolved Hide resolved

components/src/dynamo/vllm/handlers.py Show resolved Hide resolved

components/src/dynamo/vllm/handlers.py Show resolved Hide resolved

components/src/dynamo/vllm/handlers.py Show resolved Hide resolved

indrajit96 approved these changes Nov 5, 2025

View reviewed changes

components/src/dynamo/vllm/handlers.py Outdated Show resolved Hide resolved

components/src/dynamo/vllm/handlers.py Show resolved Hide resolved

components/src/dynamo/vllm/handlers.py Show resolved Hide resolved

krishung5 approved these changes Nov 5, 2025

View reviewed changes

Addressing Comments: Add Constants, Remove Network Fetch for b64 Image

bc2bc37

Signed-off-by: Krishnan Prashanth <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 17:28 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 17:29 Inactive

Temp testing script

4a03257

Signed-off-by: Krishnan Prashanth <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 18:33 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 18:34 Inactive

rmccorm4 previously requested changes Nov 5, 2025

View reviewed changes

Removed temp test files

59cf268

Signed-off-by: Krishnan Prashanth <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 19:12 Inactive

copy-pr-bot bot temporarily deployed to GITLAB November 5, 2025 19:14 Inactive

KrishnanPrash requested a review from rmccorm4 November 5, 2025 20:41

pvijayakrish approved these changes Nov 5, 2025

View reviewed changes

KrishnanPrash merged commit 25fc732 into main Nov 6, 2025
66 of 84 checks passed

KrishnanPrash deleted the kprashanth/vllm-b64-img branch November 6, 2025 00:49

pull bot pushed a commit to saidrhs/dynamo that referenced this pull request Nov 6, 2025

feat: Add base64 and HTTP image URL support to vLLM workers (ai-dynam…

b73c571

…o#4114) Signed-off-by: Krishnan Prashanth <[email protected]>

rmccorm4 added multimodal backend::vllm Relates to the vllm backend labels Nov 6, 2025

rmccorm4 reviewed Nov 8, 2025

View reviewed changes

feat: Add base64 and HTTP image URL support to vLLM workers #4114

feat: Add base64 and HTTP image URL support to vLLM workers #4114

Conversation

KrishnanPrash commented Nov 5, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Details

Where should the reviewer start?

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

milesial left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

indrajit96 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

krishung5 left a comment

Choose a reason for hiding this comment

Uh oh!

rmccorm4 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

KrishnanPrash commented Nov 5, 2025

Uh oh!

Uh oh!

rmccorm4 Nov 8, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

KrishnanPrash commented Nov 5, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 5, 2025 •

edited

Loading

rmccorm4 left a comment •

edited

Loading